Recursion based parallelization of exact dense linear algebra routines for Gaussian elimination
Abstract
We present block algorithms and their implementation for the parallelization of sub-cubic Gaussian elimination on shared-memory architectures. In contrast to the classical cubic algorithms of parallel numerical linear algebra, we focus on recursive algorithms and coarse-grain parallelization. Indeed, sub-cubic matrix arithmetic can only be achieved through recursive algorithms, which makes coarse-grain block algorithms more efficient than fine-grain ones. This work is motivated by the design and implementation of dense linear algebra over a finite field, where fast matrix multiplication is used extensively and where costly modular reductions also favor coarse-grain block decomposition. We incrementally build efficient kernels, first for matrix multiplication, then for triangular system solving, on top of which a recursive PLUQ decomposition algorithm is built. We study the parallelization of these kernels using several algorithmic variants, either iterative or recursive, with different splitting strategies. Experiments show that recursive adaptive methods for matrix multiplication, hybrid recursive-iterative methods for triangular system solving, and tile-recursive versions of the PLUQ decomposition, together with various data-mapping policies, provide the best performance on a 32-core NUMA architecture. Overall, we show that the overhead of modular reductions is more than compensated for by the fast linear algebra algorithms, and that exact dense linear algebra matches the performance of full-rank reference numerical software even in the presence of rank deficiencies.
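To illustrate the idea of recursive block multiplication over a finite field with few modular reductions, here is a minimal sketch. It is not the authors' implementation: it uses the plain eight-product recursion (cubic cost) as a stand-in for a sub-cubic scheme, and the names `matmul_mod_base`, `matmul_mod_rec`, `split`, `add_mod` and the `threshold` parameter are hypothetical. The base case accumulates each dot product in exact integers and reduces modulo `p` once per entry, and the eight recursive sub-products are mutually independent, which is what would make them candidates for coarse-grain parallel tasks.

```python
def matmul_mod_base(A, B, p):
    """Classical product; each entry is reduced mod p once, after accumulation."""
    k, m = len(B), len(B[0])
    return [[sum(row[t] * B[t][j] for t in range(k)) % p for j in range(m)]
            for row in A]


def split(M, r, c):
    """Split M into four quadrants at row r and column c."""
    top, bottom = M[:r], M[r:]
    return ([row[:c] for row in top], [row[c:] for row in top],
            [row[:c] for row in bottom], [row[c:] for row in bottom])


def add_mod(X, Y, p):
    """Entrywise sum of two equally sized matrices, reduced mod p."""
    return [[(x + y) % p for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def matmul_mod_rec(A, B, p, threshold=2):
    """Recursive 2x2 block product mod p (hypothetical sketch, cubic cost)."""
    threshold = max(threshold, 1)  # guard: blocks must never become empty
    n, k, m = len(A), len(B), len(B[0])
    if min(n, k, m) <= threshold:
        return matmul_mod_base(A, B, p)
    A11, A12, A21, A22 = split(A, n // 2, k // 2)
    B11, B12, B21, B22 = split(B, k // 2, m // 2)

    def rec(X, Y):
        return matmul_mod_rec(X, Y, p, threshold)

    # The eight sub-products below do not depend on each other.
    C11 = add_mod(rec(A11, B11), rec(A12, B21), p)
    C12 = add_mod(rec(A11, B12), rec(A12, B22), p)
    C21 = add_mod(rec(A21, B11), rec(A22, B21), p)
    C22 = add_mod(rec(A21, B12), rec(A22, B22), p)
    # Reassemble the four quadrants into one matrix.
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    print(matmul_mod_rec(A, B, 7, threshold=1))
```

A larger `threshold` means fewer, bigger base-case products and therefore fewer reductions in the additions, which is one way to read the abstract's argument for coarse-grain blocks.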
Related resources
Prototyping Parallel LAPACK using Block-Cyclic Distributed BLAS
Given an implementation of Distributed BLAS Level 3 kernels, the parallelization of dense linear algebra libraries such as LAPACK can be easily achieved. In this paper, we briefly describe the implementation and performance on the AP1000 of Distributed BLAS Level 3 for the rectangular r × s block-cyclic matrix distribution. Then, the parallelization of the central matrix factorization and the trid...
Application of Fortran Pthreads to Linear Algebra and Scientific Computing
Pthreads is a POSIX standard library for expressing concurrency on uniprocessor and symmetric multiprocessor computers. Typical multithreaded applications include database manipulation, operating systems, or any algorithm displaying task-level concurrency. These types of programs are generally coded in C. Hence, the POSIX standard only defines a C interface to Pthreads. The lack of a standard F...
Fast Solvers for Dense Linear Systems
Large-scale calculations in particle physics often require solving systems of linear equations with rational coefficients exactly. If classical Gaussian elimination is applied to a dense system, the time needed to solve such a system grows exponentially in the size of the system. In this tutorial paper, we present a standard technique from computer algebra that avoids th...
Practical Task-Oriented Parallelism for Gaussian Elimination in Distributed Memory
This paper discusses a methodology for easily and efficiently parallelizing sequential algorithms in linear algebra using cost-effective networks of workstations, where the algorithm lends itself to parallelism. A particular target architecture of interest is the academic student laboratory, which typically contains many networked computers that lie idle at night. A case is made for why a task-...
Analysis Of Exact Solution Of Linear Equation Systems Over Rational Numbers By Parallel p-adic Arithmetic
We study p-adic arithmetic and analyze the exact solution of systems of linear equations over the rational numbers. We first review the basic concepts of p-adic numbers and explain why they form a better representation. We then describe a parallel implementation of an algorithm for solving systems of linear equations over the field of rational numbers based on the ...
Journal:
- Parallel Computing
Volume 57, Issue
Pages -
Publication date: 2016